Dynamo AI Resource Provisioning Guidelines
This reference outlines resource provisioning recommendations for the Dynamo AI platform based on expected feature utilization and workloads, helping ensure optimal performance across use cases and scenarios.
Scaling Considerations
Dynamo AI platform resource recommendations are based on the following metrics:
- Throughput: Number of requests per second
- Guardrails: Number of guardrails applied per moderation request (DynamoGuard)
Throughput Scenarios
Below, we provide resource requirements for different throughput scenarios, ranging from < 1 QPS to 100 QPS. For context, we typically observe production workloads of 0.1 - 10 QPS across our customers' AI use cases; however, Dynamo AI can support peak workloads exceeding 250 QPS.
Example: For an AI use case with 100k global users, a throughput of 10 QPS equates to approximately 8-12 queries per user per day.
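The arithmetic behind the example above can be sketched as a small helper that converts a sustained throughput into average daily queries per user. The 100k-user and 10 QPS figures come from the example; the function name is illustrative and not part of the Dynamo AI platform.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def queries_per_user_per_day(qps: float, num_users: int) -> float:
    """Average daily queries per user implied by a sustained QPS."""
    return qps * SECONDS_PER_DAY / num_users

# 10 QPS spread across 100k users works out to ~8.6 queries
# per user per day, consistent with the range quoted above.
print(round(queries_per_user_per_day(10, 100_000), 1))  # 8.6
```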
| Scenario | Expected Throughput (QPS) | Use Cases |
| --- | --- | --- |
| Development | 1 QPS | Testing environments |
| Small | 5 QPS | Lightweight production scenarios |
| Medium | 10 QPS | Moderate-scale AI applications |
| Large | 50 QPS | High-demand production applications |
| Extra Large | 100 QPS | Enterprise-scale, high-performance systems |
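To pick a scenario programmatically, the tiers above can be encoded as a simple lookup that returns the smallest scenario rated at or above an expected peak throughput. The thresholds mirror the table; the helper itself is a sketch, not part of the platform.

```python
# (max rated QPS, scenario name), in ascending order, per the table above
SCENARIOS = [
    (1, "Development"),
    (5, "Small"),
    (10, "Medium"),
    (50, "Large"),
    (100, "Extra Large"),
]

def sizing_scenario(expected_qps: float) -> str:
    """Return the smallest scenario whose rated QPS covers expected_qps."""
    for max_qps, name in SCENARIOS:
        if expected_qps <= max_qps:
            return name
    # Peak workloads above 100 QPS warrant a custom sizing discussion.
    return "Contact Dynamo AI"

print(sizing_scenario(7))  # 7 QPS falls into the Medium (10 QPS) tier
```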
Resource Guidelines
General Platform
Base platform resources are used for API and UI servers. The table below outlines recommended configurations:
| Scenario | Recommended Resources | Example Cloud-Specific Details |
| --- | --- | --- |
| Development | x32 vCPUs, 64GB memory | AWS: x8 c7i.xlarge; Azure: x8 Standard_F4s_v2; GCP: x8 c2d-standard-4 |
| Small | x32 vCPUs, 64GB memory | AWS: x8 c7i.xlarge; Azure: x8 Standard_F4s_v2; GCP: x8 c2d-standard-4 |
| Medium | x64 vCPUs, 128GB memory | AWS: x16 c7i.xlarge; Azure: x16 Standard_F4s_v2; GCP: x16 c2d-standard-4 |
| Large | x128 vCPUs, 256GB memory | AWS: x32 c7i.xlarge; Azure: x32 Standard_F4s_v2; GCP: x32 c2d-standard-4 |
| Extra Large | x128 vCPUs, 256GB memory | AWS: x32 c7i.xlarge; Azure: x32 Standard_F4s_v2; GCP: x32 c2d-standard-4 |
Note: This table is a general reference. You may need fewer resources than listed, since general platform components can run on the GPU nodes and share their vCPUs and RAM; however, provisioning the amounts above guarantees consistent performance.
DynamoGuard Content Guardrails
DynamoGuard requires resources based on the number of guardrails applied to a workload. While CPUs can handle a limited number of guardrails at higher latency, GPUs offer significantly lower latency (< 300ms). For lower latency when using CPUs, we recommend compute-optimized instances. Below, we provide the resource requirements for input content guardrails. For details about output content guardrails, please reach out to our team.
Tip: For non-development workloads, we recommend GPUs due to reduced latency and higher scalability.
Note: Calculate the number of GPUs you need based on your scenario and the number of policies you will deploy in the cluster.
| Scenario | CPU Option | GPU Option | Example Cloud-Specific Instances |
| --- | --- | --- | --- |
| Development | x8 vCPUs, 8GB memory per guardrail | Same as Small scenario | AWS: c7i.xlarge; Azure: Standard_F4s_v2; GCP: c2d-standard-4 |
| Small | Not Recommended | 1 A10G GPU per 10 guardrails | AWS: g5.2xlarge; Azure: NV36ads_A10_v5; GCP: g2-standard-8 |
| Medium | Not Recommended | 1 A10G GPU per 6 guardrails | AWS: g5.2xlarge; Azure: NV36ads_A10_v5; GCP: g2-standard-8 |
| Large | Not Recommended | 1 A10G GPU per guardrail | AWS: g5.2xlarge; Azure: NV36ads_A10_v5; GCP: g2-standard-8 |
| Extra Large | Not Recommended | 2 A10G GPUs per guardrail | AWS: g5.2xlarge; Azure: NV36ads_A10_v5; GCP: g2-standard-8 |
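The per-scenario GPU ratios above can be turned into a back-of-envelope sizing calculation: divide the number of guardrails by the guardrails-per-GPU ratio and round up. The ratios are copied from the table; the helper is a sketch, not an official sizing tool.

```python
import math

# Guardrails served per A10G GPU, per the table above.
# "Extra Large" needs 2 GPUs per guardrail, i.e. 0.5 guardrails per GPU.
GUARDRAILS_PER_GPU = {
    "Small": 10,
    "Medium": 6,
    "Large": 1,
    "Extra Large": 0.5,
}

def a10g_gpus_needed(scenario: str, num_guardrails: int) -> int:
    """A10G GPUs required to serve num_guardrails in the given scenario."""
    ratio = GUARDRAILS_PER_GPU[scenario]
    return math.ceil(num_guardrails / ratio)

print(a10g_gpus_needed("Medium", 8))       # 8 guardrails / 6 per GPU -> 2
print(a10g_gpus_needed("Extra Large", 3))  # 2 GPUs per guardrail -> 6
```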
Data Generation
Data generation is a required step in custom content policy creation. Dynamo AI supports several external and in-cluster model configurations for data generation. If required, contact Dynamo support for additional model providers.
| Option | Description | Cloud-Specific Details |
| --- | --- | --- |
| Option 1 | Azure Llama 3.1-8B: azure_ai/Meta-Llama-3-1-8B-Instruct | N/A |
| Option 2 | AWS Llama 3.1-8B: bedrock/llama/us.meta.llama3-1-8b-instruct-v1:0 | N/A |
| Option 3 | GCP Llama 3.1-8B: llama-3.1-8b-instruct-maas | N/A |
| Option 4 (less performant) | In-cluster model: x1 A10G GPU + x8 vCPUs | AWS: x1 g5.2xlarge; Azure: x1 NV36ads_A10_v5; GCP: x1 g2-standard-8 |
Guardrail Fine-Tuning
Guardrail fine-tuning is a required step in custom content policy creation. DynamoGuard offers two options for fine-tuning guardrails: choose between SaaS fine-tuning and in-cluster fine-tuning based on your infrastructure.
| Option | Description | Cloud-Specific Details |
| --- | --- | --- |
| Option 1 | Fine-tune on the Dynamo SaaS environment and import policies into your cluster | N/A |
| Option 2 | x1 A10G (or similar) GPU with x8 vCPUs, 32GB RAM, and 24GB GPU memory | AWS: x1 g5.2xlarge; Azure: x1 NV36ads_A10_v5; GCP: x1 g2-standard-8 |
DynamoGuard Hallucination Guardrails
For hallucination guardrails, Dynamo supports both external and in-cluster configurations. If required, contact Dynamo support for additional model providers.
| Option | Description | Cloud-Specific Details |
| --- | --- | --- |
| Option 1 | Azure-Open-AI-GPT-4o or Open-AI-GPT-4o | N/A |
| Option 2 | In-cluster: requires x3 A10G GPUs at 1 QPS. Additional GPUs scale linearly. | AWS: x3 g5.2xlarge; Azure: x3 NV36ads_A10_v5; GCP: x3 g2-standard-8 |
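The linear scaling rule for the in-cluster option can be sketched as follows, assuming (our interpretation of the rule above) 3 A10G GPUs per 1 QPS of sustained throughput, rounded up to whole GPUs. Confirm the exact ratio with the Dynamo team before provisioning.

```python
import math

# Per the in-cluster option above: x3 A10G GPUs handle 1 QPS,
# and capacity scales linearly with additional GPUs.
GPUS_PER_QPS = 3

def hallucination_gpus(target_qps: float) -> int:
    """A10G GPUs needed for in-cluster hallucination guardrails at target_qps."""
    return math.ceil(GPUS_PER_QPS * target_qps)

print(hallucination_gpus(1))    # 3
print(hallucination_gpus(2.5))  # 8
```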
DynamoEval
DynamoEval requires the following resource configurations. The API endpoints are used for data generation and judgment.
| Requirement | Description | Cloud-Specific Details |
| --- | --- | --- |
| API Endpoints | mistral-small-latest, open-mistral-nemo, Open-AI-GPT-4o | N/A |
| CPU and memory | x1 vCPU, 4GB memory | AWS: x1 c7i.xlarge; Azure: x1 Standard_F4s_v2; GCP: x1 c2d-standard-4 |